Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication

Authors

  • Mathias Jacquelin
  • Lin Lin
  • Nathan Wichmann
  • Chao Yang
Abstract

We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv method, we compute selected elements of the inverse of a sparse matrix A that can be decomposed as A = LU, where L is lower triangular and U is upper triangular. Updating these selected elements of A⁻¹ requires restricted collective communications among a subset of processors within each column or row communication group created by a block cyclic distribution of L and U. We describe how this type of restricted collective communication can be implemented using asynchronous point-to-point MPI communication functions combined with a binary-tree-based data propagation scheme. Because multiple restricted collective communications may take place at the same time in the parallel selected inversion algorithm, we use a heuristic to prevent processors that participate in several collective communications from receiving too many messages. This heuristic reduces communication load imbalance and improves the overall scalability of the selected inversion algorithm. For instance, when 6,400 processors are used, we observe a speedup of more than 5x on the test matrices. The heuristic also mitigates the performance variability introduced by an inhomogeneous network topology.
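The binary-tree propagation idea can be illustrated with a small in-process simulation. This is a minimal sketch under stated assumptions, not the PSelInv implementation: it does not call MPI, it assumes a heap-ordered binary tree (the participant at position i forwards to positions 2i+1 and 2i+2), and the rank-rotation rule in the hypothetical `rotated` helper is an illustrative stand-in for the paper's load-balancing heuristic, whose details are not given in this abstract. All function names are illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple


def binary_tree(participants: List[int]) -> Dict[int, List[int]]:
    """Children of each rank in a binary tree over `participants`.

    participants[0] is the root; position i has children at positions
    2*i + 1 and 2*i + 2 (heap layout), so no rank has more than 2 children.
    """
    n = len(participants)
    return {
        rank: [participants[j] for j in (2 * i + 1, 2 * i + 2) if j < n]
        for i, rank in enumerate(participants)
    }


def broadcast_messages(participants: List[int]) -> List[Tuple[int, int]]:
    """(sender, receiver) pairs of a tree broadcast, in breadth-first order.

    In an MPI code each pair would be one nonblocking point-to-point message
    (MPI_Isend on the sender, MPI_Irecv on the receiver); a rank forwards the
    data to its children once its own receive has completed.
    """
    children = binary_tree(participants)
    messages: List[Tuple[int, int]] = []
    queue = deque([participants[0]])
    while queue:
        sender = queue.popleft()
        for child in children[sender]:
            messages.append((sender, child))
            queue.append(child)
    return messages


def tree_load(collectives: List[List[int]]) -> Dict[int, int]:
    """Total number of tree children per rank over several concurrent collectives.

    This equals the messages a rank sends in a broadcast, or receives in a
    reduction that traverses the same tree from the leaves to the root.
    """
    load: Dict[int, int] = {}
    for participants in collectives:
        for rank, kids in binary_tree(participants).items():
            load[rank] = load.get(rank, 0) + len(kids)
    return load


def rotated(root: int, others: List[int], shift: int) -> List[int]:
    """Hypothetical balancing rule (assumption, not the paper's heuristic):
    keep the root first and rotate the other ranks, so that different
    collectives place different ranks at the internal nodes of the tree."""
    if not others:
        return [root]
    k = shift % len(others)
    return [root] + others[k:] + others[:k]


if __name__ == "__main__":
    others = [1, 2, 3, 4, 5, 6]
    fixed = [[0] + others for _ in range(6)]            # same ordering every time
    balanced = [rotated(0, others, c) for c in range(6)]  # rotated per collective
    print(broadcast_messages(fixed[0]))
    print(tree_load(fixed))
    print(tree_load(balanced))
```

With six concurrent collectives rooted at rank 0 over ranks 1 to 6, the fixed ordering makes ranks 1 and 2 handle 12 messages each while ranks 3 to 6 handle none; rotating the ordering spreads the load to 4 messages per non-root rank. The root's load is unchanged, but in a binary tree it is bounded by 2 messages per collective, versus n − 1 if the root communicated directly with every participant.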


Similar resources

PSelInv - A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric Case

This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse nonsymmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L, U are lower and upper triangular matrices, and P, Q are permutation matrices, respectively. The PSelInv method computes selected elements of A⁻¹. The selection is confin...


A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calculations

An efficient parallel algorithm is presented and tested for computing selected components of H⁻¹ where H has the structure of a Hamiltonian matrix of two-dimensional lattice models with local interaction. Calculations of this type are useful for several applications, including electronic structure analysis of materials in which the diagonal elements of the Green’s functions are needed. The algo...


Better Algorithms for Parallel Backtracking

Many algorithms in operations research and artificial intelligence are based on the backtracking principle, i.e., depth-first search in implicitly defined trees. For parallelizing these algorithms, a load balancing scheme is needed which is able to evenly distribute the parts of an irregularly shaped tree over the processors. It should work with minimal interprocessor communication and without prio...


Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms

Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large scale learning, as utilized by many Big Data applications, efficient parallelization of SGD is in the focus of active research. Recently, we were able to show that the asynchronous communication paradigm...


Parallel search algorithm for the detection of irregular structures

Search algorithms for the Detection of Irregular Structures are extremely difficult to parallelize efficiently, due to their non-local nature that makes load balancing a major problem. In this work some algorithms are investigated and implemented for Multiple Cluster and Single Cluster Search problems. A new (asynchronous) approach for the Single Cluster Search problem is also presented, giving a h...



Journal title:
  • CoRR

Volume: abs/1504.04714

Publication date: 2015